Report for Data Mining Cup 2002 by :
Abstract
This research report was written for participation in DMC 2002 (see [8]). It follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) [1] methodology: business understanding, data understanding, data preparation, modeling, and evaluation. Two popular data mining software products are used: DISCOVERER mainly for data preparation and modeling, and WEKA for feature selection. The last part of the report offers some personal, intuitive observations on real-world data mining.
Similar resources
Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup
MOTIVATION: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whethe...
Predicting customer behaviour: The University of Melbourne's KDD Cup report
We discuss the challenges of the 2009 KDD Cup along with our ideas and methodologies for modelling the problem. The main stages included aggressive nonparametric feature selection, careful treatment of categorical variables and tuning a gradient boosting machine under Bernoulli loss with trees.
Rule-based Extraction of Experimental Evidence in the Biomedical Domain – the KDD Cup 2002 (Task 1)
Below we describe the winning system that we built for the KDD Cup 2002 Task 1 competition. Our system is a Rule-based Information Extraction (IE) system. It combines pattern matching, Natural Language Processing (NLP) tools, semantic constraints based on the domain and the specific task, and a post-processing stage for making the final curation decision based on the various evidence (positive ...
Using Data and Text Mining Techniques for Yeast Gene Regulation Prediction: A Case Study
We focus on the problem of predicting yeast gene regulation experiments. In order to construct a good solution, we study combinations of different methods that are not yet to be found in any single data mining application. We describe our approach to propositionalizing the given relational data that describes the interaction among proteins. We study how we can exploit a large archive of scienti...
Bennett Netflix 100 Winchester Circle
INTRODUCTION: The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [2]. The KDD Cup itself in 2007 con...
Publication date: 2003